Revisiting Multi-Subject Random Effects in fMRI 



J.D. Rosenblatt^'*, M. Viiik'^ Y. Benjamin^'* 

"■Department of Statistics and Operations Research, The Sackler Faculty of Exact Sciences, 

Tel Aviv University, Israel 
^Rudolf Magnus Institute of Neuroscience, Department of Psychiatry, University Medical 
Center Utrecht, Utrecht, The Netherlands 



Abstract 

Random Effects analysis has been introduced into iMRI research in order to 
generalize findings from the study group to the whole population. Generalizing 
findings is obviously harder then detecting activation in the study group since 
in order to be significant, an activation has to be larger then the inter-subject 
variability. Indeed, detected regions are smaller when using random effect anal- 
ysis versus fixed effects. The statistical assumptions behind the classic random 
effects model are that the location-wise effect is normally distributed over sub- 
jects, and "activation" refers to a non-null mean effect. We argue this model 
is unrealistic and conservative compared to the true population variability. We 
propose a finite-Gaussian-mixture-random-effect, at each brain location, in or- 
der to capture brain plasticity and registration anomalies. This model suggests 
a natural quantification of "activation" using the prevalence of activation over 
subjects. We present the estimation of this prevalence, and discuss the testing 
problem of mixture alternatives (not the typical shift alternatives) . Intrestingly, 
we find the signed-rank test to be considerably more (Pitman) efficient than the 
classic group-level t-test. 

The end result of the proposed analysis is a map of the prevalence at locations 
with significant activation, highlighting the regions that are common to many. 
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1. Introduction and Motivation 

A typical cognitive fMRI study, entails the group-wise localization of brain 
regions with evoked responses to a given cognitive stimulus. Individual sta- 
tistical maps containing regression coefficients per voxel are combined across 
subjects to allow for group wise inference using "Random Effects Inference". 
This inference has become standard since it offers reproducible findings (Fris- 
ton et al. [8]). Its rationale is to compare the estimated effect to its variability 
over different replications with different subjects. If the observed effect is im- 
probable when assuming "no activation" and given the sampling variability- a 
location is declared "active". 

The statistical assumptions behind the classic random effects approach are: 
(a) Gaussianity: the voxel-wise effect is Gaussian distributed over subjects, (b) 
Shift alternative: "activation" refers to a non-null average effect, (c) Homogene- 
ity: at a fixed location, all subjects are either active or inactive. Unfortunately, 
these assumptions rarely hold. Large deviations from the Gaussianity assump- 
tion have been demonstrated in several large studies (eg. Thirion et al. \v\ and 
section [3] herein) . This would typically affect sensitivity but not specificity 
Assumption (b) is just a matter of convention: should locations where only 50% 
of subjects show activation should be called "active" or "inactive"? Finally, for 
assumption (c) to be violated, it suffice for voxels to contain different structural 
and/or functional information across subjects, which is indeed the case; As put 
in Thirion et al. "spatial mis-registration implies that at a given voxel, i.e. 
a given position in MNI space, some subjects have activity while other subjects 
do not...". Also in Fedorenko et al. Q|: "... activations land in similar but mostly 
non overlapping anatomical locations". The visual motion area (MT) has been 
noted to vary over individuals by more than 2 cm after Talaraich normalization. 
The primary visual cortex has also been noted to vary in size- up to two fold 
over different subjects [13i] . 
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Spatial smoothing of the signal is the typical solution for the above violations. 
It will spatially smear the signal so that between-subjects agreement is larger. 
It will also alleviate the Gaussianity assumption via the central limit theorem. 
Alas, spatial smoothing comes at the cost of spatial precision and does not 
address the model's assumptions. 

In this work we take a different path; Without spatially smoothing the para- 
metric maps, our model allows for voxels mapped to the same location to contain 
some proportion of both active and inactive individuals. The suggested model 
is both statistically realistic and explicitly allows for spatial disagreement over 
subjects- due to either brain plasticity or mis-registration. This amortizes the 
mis-registration effects while allowing to highlight regions of agreement over sub- 
jects with high spatial precision. We also argue that the proportion of the active 
sub population at each location ("prevalence") is a more interesting parameter 
than the mean activation. In particular when large samples are available and 
a significance test becomes non-informative, such as in Thyreau et al. [18]. In 
the following sections we try to formalize and justify the population-mixture as- 
sumption. Section [2] formalizes this intuition, which is applied to a large fMRI 
study in section [S] A discussion follows in section |4l 

2. Method 

2.1. Distribution of the Voxel-Wise Effects Over Subjects 

We propose a voxel-wise adaptation of classical random-effect model. Re- 
calling the random effect model: 



y,{t,v)^X,{t)l3,{v) + e,it,v) (2.1) 

Where yi{t, v) is the fMRI signal of subject i at time t in voxel v. The expected 
(unsealed) signal induced by a stimulus is denoted by Xi (t) and assumed known 
(see Worsley et al. 20| for details). Measurement error, intra-subject psycho- 
logical variability and unmodeled effects are captured by (t,v). The subject 
specific effect induced by a stimulus in voxel u, is denoted by (v) . As pre- 
viously mentioned, in the classical random effects model it is assumed to be 
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Gaussian distributed over subjects. I.e. Vi = l,...,n : f {Pi) — 4'p,,cr^ {Pi) — 



—7^ — w exp ( — a I • For the reasons described in the introduction, we will 
now allow it to be a mixture of two populations: The "inactive" population 
centred at zero and the "active" population with a non-null center. The model 
for the active population effects is as before. After omitting the voxel index v, 
the probability density function (PDF) of this mixture is given by: 



Where /i (.) — 5 {.) is the Dirac delta, < p < 1 and J2 {■) ~ 4>fi,cr^ {■) with 
the mean effect ^ allowed to be positive or negative like in the classical random 
effects setup. 

2.2. Toy Example 

To demonstrate that Eq. 12.21 captures brain plasticity and mis-registration 
consider the following example: All subjects have a simple signal in their two 
dimensional "brains" as depicted in figure I2.1I A . The signal is similar across 
subjects, but not identical, in the sense that different subjects are allowed to 
have differently ellipse-shaped signals- mimicking functional plasticity. The 
centres are also changing slightly between subjects, mimicking misregistration 
effects. Figure [5TTJ-B depicts the relative frequency of signal overlap over brains. 
After some noise is added, the prevalence of activation {p in eq. 12. 2p is now 
estimated from these simulated brains. The estimate map of p {SPM {p}) 
based on 100 perturbed and noised images are seen in figure I2.1I C. Figure 
12. II P is similar to figure I2.H -C except the noise added had much heavier tails 
demonstrating the robustness of the prevalence estimates to outliers. 

2.3. Estimation Using Self- Referential Task fMRI Data 

In classical random-effects analysis, one would estimate the parameters of 
interest, by deriving the marginal distribution of yi{t, v) mixed by / In the 
fMRI literature it is more common to use a two-level approach: first estimate 
the subject effect, and then estimate the random effect parameters (Mumford 




/(/3J = (l-p)/i(/3,)+p/2(/30 



(2.2) 
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Figure 2.1: Toy Signal in Two Dimensional "Brain". Figure A portrays the activation region 
in a single arbitrary "brain". Figure B portrays the true prevalence (relative frequency) of 
the activation at each location. Figure C portrays the estimated prevalence from 100 "brain 
scans" with Af{0, 0.25) noise. Figure D is the same as C but demonstrates the robustness of 
the prevalence estimates to outliers. In this case, generated with 0.88A/'(0, 0.15) + 0.12A/'(0, 1) 
which has the same signal to noise ratio but with heavier tails than C. representing outliers. 



ft^SMflt&' WaiJi AetiS^tjon Example B- Tftle AetlVation Prevalence 




C- Estimated Prevalence 



D- Robustness of Estimated Prevalence 



Figure 2.2: Distribution of Kolmogorov Smirnov test statistic over voxels, comparing different 
fitted distributions to data. To avoid over fitting, a train-test data spUt has been used. 
Boxplots are sorted by the median. 
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and Nichols [ill, "^t sd. [21]). These are known as the first and second level 
respectively. We adopt this approach for convenience, both mathematical and 
computational, but note this approach is still a matter of debate (Chen et al. 

Qi)- 

Under the two-level approach, f3i (v) are estimated and not observed di- 
rectly, so we have to allow for their measurement error. The second level effect 
distribution is still given by 12.21 but here /i (.) is not concentrated at zero. Af- 
ter considering various distributions for the inactive group- Gaussian, Cauchy, 
Laplace and Logistic- we have chosen a centred two-component-Gaussian-scale- 
mixture, which was also adopted in Woolrich {l9]. Figure [2?2l demonstrates the 
three component Gaussian mixture typically fits the data better than other 
candidate distributions. 

By redefining /i (.) = (pi^o.^f (•) +P20o,rT| i-)). Pa = P, and keeping 
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the standard /2 (.) ~ (t^^.a-^ (•) we get a three component mixture: 

/ = (13) + P2<jyo,al (/3) + P^^^,al [P) (2.3) 

Where pi^P2.P3 are the voxel-wise mixing proportions, naturally summing 
to 1, and 4>mean,variance are Gaussian PDFs allowed to have voxel-specific pa- 
rameters. Figure [221 demonstrates the fit of the whole mixture in several select 
locations. 

We are now left with the problem of estimating the parameters of Eq. 12.31 
(pi,P2,P3, cTi, cr|, We use the expectation-maximization algorithm (EM) 
to maximize the likelihood. We note that as in any mixture problem, identifica- 
tion problems arise. Even with the variances constrained so that (t\ < (t|, 
any two-component-mixture can be parametrized as (pi,p2, 0, •, uj, i7|, •) U 
(•,P2, 0, a^, a|, a^) U (pi, •, •, 0, a^, cr|, a|) where • denotes a free parame- 
ter. We alleviate this problem by constraining the parameter space; In order to 
allow the interpretation of as the prevalence, the values of p^ were "pushed" 
towards zero as /i vanishes: ps < 1 — exp ^—/xy^ ^^y"^^^^ "^ . The form of the 
constraint was chosen so that: (a) It forces — > as /i — >■ permitting p3 to be 
interpreted as the activation prevalence, (b) The constraint is less restrictive as 
the sample size increases, (c) The constraint is more restrictive as the variance 
of the null population increases. We note that in the vast majority of voxels, 
the constraint was non-binding. 

Once the prevalence has been estimated, the following obvious question is 
"could it be null?". The testing stage is a separate problem we will discuss only 
briefly and for which several solutions might be considered. See ? ] for some 
examples. In particular, we find a particular appeal in Wilcoxon's classic signed 
rank test; This for several reasons, (a) It is robust to model assumptions, (b) 
It is not necessarily dominated by the group-t-test if considering a mixture al- 
ternative, (c) It is easy to compute and interpret. In particular when compared 
to a generalized likelihood test to test for a mixture alternative. We return to 
this point with real fMRI data on our hands in section 13.21 
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Note that unlike the classical random-effect setup, this null is not tested 
against a shift alternative : p3 = 1; /i 7^ 0, (j| > O) , but rather against a 
mixture alternative (iJi : > 0;/i 7^ 0, > O). This is because we consider 
as "active" any location with a non-null prevalence. We initially considered a 
generalized likelihood ratio test. This path has been abandoned due to many 
mathematical and computational complexities (see Garel ^ for an overview). 
We have finally decided to use the Wilcoxon signed-rank statistic and this for 
several reasons: (a) It is robust to model assumptions, (b) It is sensitive to 
location and shape shifts- both present when considering mixture alternatives, 
(c) It is easy to compute and interpret, (d) It is more powerful than the group- 
t-test in our setup. We return to this point with real fMRI data on our hands 
in section [ 

3. Results 

The proposed model was used to analyze fMRI data of 64 subjects perform- 
ing a self-referential task make judgements about trait adjectives. See section 
16.11 for details. This is an unusually large study, offering the opportunity to 
validate the distributional assumptions presented. Note that the data has not 
been smoothed, except for some voxel blending due to the spatial normalization 
to the MNI template. We advocate the use of unsmoothed data to avoid the no- 
torious spatial smearing of the signal (e.g. Saxe et al. p^]) which compromises 
spatial accuracy. 

The SPM of the prevalence estimates is denoted SPM{p3} and demon- 
strated in figures 13.11 and I3.3I A. This estimate is compared to the standard 
second-level t-statistic depicted in I3.3I A. The boundaries of the activation re- 
gion exhibit a smooth decay of from 1 to (more noticeable in I3.3I -A) . This 
phenomenon has already been observed by others, albeit with different interpre- 
tation: "Deviation from normality of the effects... coincides with the boundaries 
of activated areas'-Thirion et al. 17|. Since the phenomenon is to be expected 
given our motivation we find this to be convincing evidence favouring our model 
where non-Gaussianity stems from sub-populations mixing (recall, no smoothing 
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Figure 3.1: Estimated signed-prevalence in real fMRI data. Showing the estimated prevalence 
with the sign of the effect. Masked at significant (prevalencoO) locations using the signed- 
rank test statistic with FDR control using BH at FDR<0.05. Prevalence contour lines were 
added to help visualize the shape of the activation regions. 
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has been applied to the data) . Also note that the change in prevalence happens 
at different rates across the image which excludes voxel blending as a cause for 
the smooth decay in prevalence. To further justify the mixture assumption, in 
figure we examine the effect estimates at several select locations which in- 
deed demonstrate the non Gaussian nature of the data. Figure [^21 demonstrates 
the mixture's better fit is not limited to just some select locations, but rather 
occurs (on average) over the whole brain volume. We are thus confident that 
our mixture model seems more appropriate for the data we encounter than the 
single Gaussian underlying the usual random-effects analysis. 
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Figure 3.2: Distribution of Effect in Selected Locations: A density plot of the second-level 
effect distribution (solid gray line) along with the fitted mixture (solid black line) and its two 
weighted inactive components (dotted lines) and a third weighted active component (dash- 
dotted line). The mean of the active sub population is denoted with a vertical dashed line. 
Group t-statistic are included. The figures demonstrate the fit of the three component mixture 
to the second level effects at four locations referencing figure [3731 
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3.1. Interpreting SPMfprevalence} 
Figures O 

and 13.31 depict the estimated prevalence map. A higher prevalence means 
more people (in the population) show activation at that location. In particular, 
this says nothing about the magnitude of the activation (when present) depicted 
in SPM{fi} . High prevalence might be accompanied by high magnitudes of 
effect such as in mark 1 in fig. 13.21 and 13.31 This is the simplest pattern of 
"activation". High prevalence might come with small effects (mark 2), which 
might be seen as statistical artifact, or indeed, as a prevalent small effect. The 
case of large signal with small prevalence should probably be interpreted as 
an outlier (mark 3). The t-statistic generally capture the existence of signal, 
but note that large variability around zero, might mask the existence of a non 
centred group such as in mark 4. 

An example of these phenomena, can be seen in figure [3T] In particular note 
the "boot" shaped (significant) activation region at coordinates x « 100, y « 150. 
This region is also apparent in the t-maps in figure 13.31 C, albeit it is less 
apparent due to what is probably a small subset of distinct (not to say "outliers") 
subjects. Note the interesting spatial asymmetry of this region is completely 
masked in the standard smoothed SPM {t} in figure [3T3l D. 

In summary, the prevalence estimate and the classical t-statistic are related, 
but capture different aspects of the activation pattern. We think that the preva- 
lence is the relevant measure of interest and should be in a researcher's tool-set. 

3.2. Group-T versus Group- Wilcoxon; Power Gonsiderations 

We have previously stated the Wilcoxon test should be preferred over the 
group-t-tcst for the localization problem. To see this last point we first note 
that more voxels have been found active; 11,817/27,401 using Wilcoxon versus 
11,037/27,401 using the group-t-test. More importantly, the (Pitman) asymp- 
totic relative efficiency of the two test statistics can be computed. We compute 
it using the average value of the nuisance parameters in the current study and 
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Figure 3.3: Maps of prevalence (A) and effects (B) compared with standard smoothed (D) 
and non-smoothed (C) second-level t-maps. The distribution of contrasts over individuals and 
value of t-statistic, in marked locations (1-4), can be seen in figure |3^ 
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find it to be ewiicox,T = 0.14. That is, the Wilcoxon test is (asymptotically) 
about seven times more efficient when testing for such mixture alternatives. 

4. Discussion 

Much of the neuroscientific literature is devoted to the localization of brain 
activation. Little attention is given to the magnitude of the mean effect at active 
locations. This is no surprise given that the magnitude of the effect is variable 
even over different sessions for the same subject (Raemaekers et al. [13 )■ The 
suggested mixture-model approach admits a natural and intuitive quantification 
of the level of activation at a location. Not by its strength, but rather by 
its prevalence. The typical active /inactive qualification, is an instance of the 
suggested model, when G {0, 1}. This estimation approach is particularly 
appropriate in large samples where prevalence estimators have low variance and 
significance tests are non informative and trivially rejected such as in Thyreau 
et al. Q. 

Another gain is spatial precision, noticeable when visualizing the prevalence 
maps. As seen in figure [373l A compared to l3.3f C. the prevalence maps sharper 
the common regions of activation across subjects in comparison to the common 
t-maps. 

4.1. Stability 

To verify the stability of our findings, we compared the agreement in the 
activation region using two statistics: the t-statistic and our prevalence estimate. 
As seen in figure |4T] the difference in the stability of activation regions defined 
by the two test statistics is negligible. 

4-2. Related Ideas 

The concept of "prevalence" of activation is not a new one. In Friston et al. 
0| the authors discuss how the use of conjunction hypotheses could allow to 
infer on the population without the explicit distributional assumptions in the 
random effect approach. The "number of subjects in a population showing the 
effect" denoted by 7 in their Eq. (1), is precisely the prevalence discussed in 
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Figure 4.1: As the activation threshold of each statistic varies, a larger proportion of the 
brain is declared active (x axis). For each such proportion, we check the proportion of spatial 
agreement in activation over two data splits (y axis). As can be seen, the difference in stability 
between activation defined using the t-statistic and the prevalence estimates, are negligible. 




this paper. A test for "at least u out of n" active subjects can be seen as a 
"testimator" of this prevalence. This is indeed the approach taken in Heller 
et al. Q 

The use of finite mixtures in the context of fMRI has also been suggested. 
Xu et al. [21]. motivated by the artifacts of spatial smoothing, recur to a fi- 
nite Gaussian mixture to model variability between subjects. The number of 
components is however random and their weights depend on their proximity 
to an "activation center". The spatial distribution of these activation centres 
is constructed as a multilevel point process. This construct allows to localize 
both a subject's activation centres and the group activation centres. It also 
admits a concept of "prevalence" albeit somewhat more complicated than the 
one presented here. 

The finite Gaussian mixture also appears in Woolrich jl^. In which the 
mixture is motivated by heavier-than-Gaussian tails of the effect distribution. 
The author uses a scale mixture to capture outliers in the effect distribution. 
The author does hint to the use of a "mean-shift model", but again, only for the 
purpose of capturing outliers and not as a distinct sub population. We have 
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indeed adopted the Gaussian scale mixture as the null population model. We 
empirically found it to have a good fit. 
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6. Appendix A- Technical Details 

6.1. Data 

Data from 64 subjects (30 male, 34 female, mean age 30.3 +/- 6.5 SD years). 
These data were acquired at the University Medical Center Utrecht as part of 
a larger study (Zandbelt et al. [22], van Bimren et al. . All subjects were 

right-handed. 

Subjects performed a self-referential task. In short, subjects had to make 
judgements about trait adjectives (for example 'lazy') in relation to themselves 
(Self condition), to someone else (Other condition), or they had to indicate 
whether this trait was socially desirable (Control condition). Conditions were 
presented in five separate blocks of eight trials (28s) each, alternated with rest 
periods of 30s. Total task duration was about lOmin 32s fMRI measurements 
All imaging was performed on a Philips 3.0T Achieva whole-body MRI scanner. 
Functional images were obtained using a 2D-EPI-SENSE sequence with the 
following parameters: voxel size 4 mm isotropic; TR= 1600 ms; TE = 23 ms; 
flip angle = 72.5°; matrix 52x30x64; field of view 208x120x256; 30-slice volume; 
SENSE-factor R=2.4 (anterior-posterior). A total of 395 functional images were 
acquired during the self-reflection task. After the acquisition of the functional 
images, an 3D Fast Field Echo (FFE) Tl-weighted structural image of the whole 
brain was made (scan parameters: voxel size 1 mm isotropic, TR = 25 ms; TE 
= 2.4 ms; flip angle = 30°; field of view 256x150x204, 150 slices). 
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fMRI preprocessing and analysis Image preprocessing and analyses were car- 



ried out with SPM5 (http://www.fil.ion.ucl.ac.uk/spm/). After realignment, 
the structural scan was co-registered to the mean functional scan. Next, us- 
ing unified segmentation the structural scan was segmented and normalization 
parameters were estimated. Subsequently, all scans were registered to a MNI 
Tl-standard brain using these normalization parameters and a 3D Gaussian 
filter (8-mm full width at half maximum) was applied to all functional images. 
The preprocessed functional images were submitted to a general linear model 
(GLM) regression analysis. The design matrix contained factors modelling the 
onsets of the Self, Other and Control condition as well as the instructions that 
were presented during the task. These factors were convolved with a canoni- 
cal hemodynamic response function (Friston et al., 1995). To correct for head 
motion, the six realignment parameters were included in the design matrix as 
regressors of no interest. A high-pass filter was applied to the data with a cut-off 
frequency of 0.0055 Hz to correct for drifts in the signal. For the second-level 
analysis, we used the self condition versus baseline (rest) contrast. 

6.2. Statistical Method 

As previously mentioned, estimation was performed by maximizing the like- 
lihood using an EM algorithm. A major concern when solving several tens-of- 
thousands of EM problems, is speed, which is largely affected by the initializa- 
tion values. Moment estimators are typical initialization values, but having six 
nonlinear moment equations these are hard to find. We thus employ a hybrid 
solution, in which we search over a grid of (pi,p2) values, solve the four moment 
equations given (pi,p2), and keep the highest likelihood value combinations as 
initialization values for the EM. This initialization heuristic allowed considerable 
speed gains during estimation. 

Implementation was done in the R programming environment (Team [15|). 
The described estimation procedure has been implemented in the R pac kag e 
FPF (fMRI Prevalence Finder) available from R- Forge (Theufil and Zeileis |16| ) 



at http: //rosenblattl .r- forge . r-project . org/[ See the package's in-line 
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help for details. 
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